DETAILED ACTION

1. This final Office action is in response to the amendment filed 1 November 2006
and the Petition Decision of 22 January 2007.

2. Claims 8, 11-15, 17-23, and 26-29 are pending. Claims 8, 15, 20, and 23 are 
independent claims. Claims 1, 3-7, 10, 16, and 25 have been cancelled. 

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a 
person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived 
by the manner in which the invention was made. 

3. Claims 8, 11-15, 17-23, and 26-29 remain rejected under 35 U.S.C. 103(a) as
being unpatentable over Lantrip et al. (USPN 6,298,174 B1, filing date 10/15/1999),
hereinafter Lantrip, further in view of Ruocco et al. (USPN 5,864,855, filing date
2/26/1996), hereinafter Ruocco.

Regarding independent claim 8, Lantrip discloses a method of clustering
documents in datasets (col. 2, lines 39-42: document vectors are arranged into
clusters), comprising: clustering first documents in a first dataset to produce first
document classes (col. 2, lines 39-42: document vectors are arranged into clusters);
and creating centroid seeds based on said first document classes (col. 2, lines 43-45:
the invention finds centroids). Lantrip fails to disclose clustering second documents
in a second dataset using said centroid seeds. However, Ruocco (col. 14, lines 10-45)
discloses, in the claims, processing second datasets in parallel based on cluster
information from previous cluster vectors (see col. 14, lines 28-30), in order to gain
the benefit of information from previous clusters to improve analysis of subsequent
datasets. Ruocco's invention may further be interpreted such that said second dataset
has a similar clustering to that of said first dataset (the term "similar" is sufficiently
broad that any two given datasets would have some degree of similarity; see the
35 U.S.C. 112 rejection), and such that said second dataset comprises a new, but
related dataset different than said first dataset (once the first dataset is transformed,
it is by definition a new, but related dataset). It would have been obvious to one of
ordinary skill in the art at the time of the invention to use the information contained in
the centroid seeds of Lantrip for subsequent datasets, as in Ruocco, in order to
improve analysis of subsequent datasets.
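For illustration only: the claim 8 step of clustering second documents in a second dataset using centroid seeds derived from a first dataset can be sketched as k-means clustering whose starting centroids are those seeds. This Python sketch assumes Euclidean distance, plain Lloyd-style iterations, and a fixed iteration count; the helper names `nearest` and `cluster_with_seeds` and the toy vectors are hypothetical. It is not the applicant's implementation, nor the method of Lantrip or Ruocco.

```python
import math

def nearest(vector, centroids):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(vector, centroids[k]))

def cluster_with_seeds(vectors, seeds, iterations=10):
    """Hypothetical helper: k-means whose starting centroids are the given seeds."""
    centroids = [list(seed) for seed in seeds]
    labels = []
    for _ in range(iterations):
        # Assignment step: each document vector joins the class of its nearest centroid.
        labels = [nearest(v, centroids) for v in vectors]
        # Update step: move each centroid to the mean of its members (kept if empty).
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy example: seeds from a hypothetical first dataset, applied to second documents.
first_seeds = [[1.0, 0.0], [0.0, 1.0]]
second_vectors = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.9], [0.0, 1.0]]
labels, centroids = cluster_with_seeds(second_vectors, first_seeds)
print(labels)  # [0, 0, 1, 1]
```

Seeding fixes the starting centroids, so each class index in the second clustering starts from the corresponding class of the first dataset; random initialization would not preserve that correspondence.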

Regarding dependent claim 11, Lantrip fails to disclose a method further
comprising generating a second vector space model by counting, for each word in said
first dictionary, a number of said second documents in which said word occurs.
However, Ruocco, in col. 14, lines 20-35, discloses generating such a vector space
model for multiple document sets in order to aid in the clustering analysis of the
document sets. It would have been obvious to one of ordinary skill in the art at the time
of the invention to generate a second vector space model in the manner of Ruocco in
Lantrip's invention in order to aid in the clustering analysis of the document sets.
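For illustration only: one reading of the claim language "forming a first dictionary of most common words" and "counting, for each word in said first dictionary, a number of said second documents in which said word occurs" is sketched below in Python. The assumptions are that words are lowercase, whitespace-separated tokens, that "most common" means highest total occurrence count, and that each document is represented by a 0/1 occurrence vector whose column sums give the per-word document counts. The helper names `build_dictionary` and `vector_space_model` and the toy documents are hypothetical.

```python
from collections import Counter

def build_dictionary(documents, size):
    """Hypothetical helper: the `size` most frequent words across a dataset."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return [word for word, _ in counts.most_common(size)]

def vector_space_model(documents, dictionary):
    """One 0/1 row per document, plus the number of documents containing each word."""
    rows = []
    for doc in documents:
        words = set(doc.lower().split())
        rows.append([1 if word in words else 0 for word in dictionary])
    # Column sums: for each dictionary word, how many documents it occurs in.
    doc_counts = [sum(column) for column in zip(*rows)]
    return rows, doc_counts

# The first dictionary is built from the first dataset and reused on the second.
first_docs = ["cluster the data", "cluster centroid seeds", "data vectors"]
second_docs = ["data cluster", "new data", "seeds only"]
dictionary = build_dictionary(first_docs, 2)
rows, doc_counts = vector_space_model(second_docs, dictionary)
print(dictionary, rows, doc_counts)  # ['cluster', 'data'] [[1, 1], [0, 1], [0, 0]] [1, 2]
```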

Regarding dependent claim 12, Lantrip discloses that said creating of said 
centroid seeds comprises: classifying said second vector space model using said first 
document classes to produce a classified second vector space model (col. 2, lines
39-42, the vector space model is clustered); and determining a mean of vectors in each
class in said classified second vector space model, wherein said mean comprises said 
centroid seeds (col. 2, lines 43-45, the centroid is the center of mass of the clusters). 
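For illustration only: the claim 12 steps of classifying the second vector space model using the first document classes and taking the mean of the vectors in each class as the centroid seeds are sketched below in Python. Neither the action nor the quoted claim language specifies the classifier, so a 1-nearest-neighbor rule against the labeled first-dataset vectors is assumed here; the helper names `classify` and `centroid_seeds` and the toy data are hypothetical.

```python
import math

def classify(vectors, first_vectors, first_classes):
    """Hypothetical 1-nearest-neighbor classifier built from the first dataset."""
    result = []
    for v in vectors:
        best = min(range(len(first_vectors)), key=lambda i: math.dist(v, first_vectors[i]))
        result.append(first_classes[best])
    return result

def centroid_seeds(vectors, classes):
    """Mean vector of each class; these means serve as the centroid seeds."""
    seeds = {}
    for label in sorted(set(classes)):
        members = [v for v, c in zip(vectors, classes) if c == label]
        seeds[label] = [sum(column) / len(members) for column in zip(*members)]
    return seeds

first_vectors = [[1, 0], [0, 1]]
first_classes = ["a", "b"]
second_vectors = [[2, 0], [1, 0], [0, 1], [0, 3]]
classes = classify(second_vectors, first_vectors, first_classes)
print(classes, centroid_seeds(second_vectors, classes))
# ['a', 'a', 'b', 'b'] {'a': [1.5, 0.0], 'b': [0.0, 2.0]}
```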

Regarding dependent claim 13, Lantrip and Ruocco fail to disclose a method 
further comprising forming a second dictionary of most common words in said second 
dataset; generating a third vector space model by counting, for each word in said 
second dictionary, a number of said second documents in which said word occurs; and 
clustering said documents in said second dataset based on said third vector space 
model to produce a second dataset cluster. However, this constitutes simply extending 
and repeating claim 3 to a third dataset, and it was notoriously well known in the art at 
the time of the invention that it is useful to repeat steps for multiple datasets to take 
advantage of their utility for subsequent data. It would have been obvious to one of 
ordinary skill in the art at the time of the invention to extend the steps of claim 3 to a 
subsequent dataset to gain the benefits of the analysis for that dataset. 

Regarding dependent claim 14, Lantrip discloses in col. 2, lines 39-45 that
clustering of said documents in said dataset using said centroid seeds produces an
adapted dataset cluster. Lantrip fails to disclose the use of multiple datasets and that
the method further comprises comparing classes in said adapted dataset cluster to
classes in said second dataset cluster, and adding classes to said adapted dataset
cluster based on said comparing. However, Ruocco (col. 4, lines 61-67) addresses
comparing multiple dataset clusters in order to obtain more information about the
relative status of the datasets. It would have been obvious to one of ordinary skill in the
art at the time of the invention to compare multiple dataset clusters in order to obtain
more information about the relative status of the datasets.

Regarding independent claim 15, it is essentially analogous to claim 1 except 
that it involves the steps of generating a vector space model of said second documents, 
which Ruocco presents in col. 14, lines 27-36, and classifying said vector space model 
of said second documents using said first document classes to produce a classified 
vector space model, which Ruocco presents in col. 14, lines 27-36. It would have been 
obvious to one of ordinary skill in the art at the time of the invention to use the Ruocco 
form of vector space analysis in addition to the Lantrip material from the rejection of 
claim 1 in order to enhance the classifications of the two datasets. The resulting
combination would render claim 15 obvious.

Regarding dependent claim 17, the applicant discloses the limitations 
substantially similar to those in claim 11. Claim 17 is similarly rejected. 

Regarding dependent claim 18, the applicant discloses the limitations 
substantially similar to those in claim 13. Claim 18 is similarly rejected. 

Regarding dependent claim 19, the applicant discloses the limitations 
substantially similar to those in claim 14. Claim 19 is similarly rejected.

Regarding independent claim 20, Lantrip discloses a method of clustering
documents comprising: forming a first dictionary of most common words in a first
dataset (col. 2, lines 30-35, Lantrip forms a first dictionary of common words);
generating a first vector space model by counting, for each word in said first dictionary,
a number of said first documents in which said word occurs (col. 2, lines 35-40, Lantrip
forms vectors); clustering said first documents in said first dataset based on said first
vector space model to produce first document classes (col. 2, lines 39-42, Lantrip forms
clusters); determining a mean of vectors in each class in said classified second
vector space model to produce centroid seeds (col. 2, lines 43-45, Lantrip forms
centroid seeds); and clustering documents in a second dataset using said centroid
seeds (col. 2, lines 45-57, Lantrip clusters using centroids). Lantrip fails to disclose
generating a second vector space model by counting, for each word in said first
dictionary, a number of said second documents in which said word occurs, and
classifying said second documents in said second vector space model using said first
document classes to produce a classified second vector space model. However,
Ruocco (col. 14, lines 28-36) indicates that vector clustering analysis may involve
multiple datasets in order to gain the benefit of information analysis from multiple
sources. It would have been obvious to one of ordinary skill in the art at the time of the
invention to have vector clustering analysis involve multiple datasets in order to gain
the benefit of information analysis from multiple sources.

Regarding dependent claim 21, the applicant discloses the limitations
substantially similar to those in claim 13. Claim 21 is similarly rejected. 

Regarding dependent claim 22, the applicant discloses the limitations 
substantially similar to those in claim 14. Claim 22 is similarly rejected. 

Regarding independent claim 23, the applicant discloses the limitations
substantially similar to those in claim 8. Claim 23 is similarly rejected. 

Regarding dependent claim 26, the applicant discloses the limitations
substantially similar to those in claim 11. Claim 26 is similarly rejected.

Regarding dependent claim 27, the applicant discloses the limitations 
substantially similar to those in claim 12. Claim 27 is similarly rejected. 

Regarding dependent claim 28, the applicant discloses the limitations 
substantially similar to those in claim 13. Claim 28 is similarly rejected. 

Regarding dependent claim 29, the applicant discloses the limitations 
substantially similar to those in claim 14. Claim 29 is similarly rejected. 



Response to Arguments 

4. Applicant's arguments filed 1 November 2006 have been fully considered but 
they are not persuasive. 

The applicant argues that Ruocco fails to teach that the second dataset has a
similar clustering, based on said centroid seeds, to that of said first dataset (page 10).
The examiner respectfully disagrees. Ruocco suggests clustering second documents in
a second dataset using said centroid seeds, such that said second dataset has a similar
clustering to that of said first dataset (pages 9-10). Although the applicant argues that
Ruocco uses the centroid seeds of the first dataset, the claim limitations require "using
said centroid seeds" (claim 8, line 15; emphasis added). Although these centroid seeds
may be used with the first documents, the plain claim language precludes using a
second set of centroid seeds and instead requires that the original centroid seeds be
used.

Conclusion

5. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Kyle R. Stork whose telephone number is (571)
272-4130. The examiner can normally be reached on Monday-Friday (8:00-4:30).

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Stephen Hong can be reached on (571) 272-4124. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for
published applications may be obtained from either Private PAIR or Public PAIR.
Status information for unpublished applications is available through Private PAIR only.
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should
you have questions on access to the Private PAIR system, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a
USPTO Customer Service Representative or access to the automated information
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Kyle R Stork 
Patent Examiner 
Art Unit 2178 